Skip to content

Fixed creating ingresses without admin access #407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 20, 2023

Conversation

Bobbins228
Copy link
Contributor

@Bobbins228 Bobbins228 commented Nov 17, 2023

Issue link

Closes #400

What changes have been made

Notable changes:

  1. Removed the use of getting the ingress domain from config.openshift.io.ingresses and opted for the creation of routes when on OpenShift.
  2. Fixed the creation of ingresses when setting mcad=False in cluster configuration
  3. On OpenShift users must pass an ingress_domain when using the ray client. (this is mandatory as we need the domain name when creating the cert command in the yaml see here
  4. We now use the discovery api to discern whether the user is on Kind or OpenShift by searching for the route.openshift.io api group. (This should resolve cluster level privileges issue)

Verification steps

  1. Run through a notebook on Kind & OpenShift and verify ingresses/routes are created properly
  2. Run through a notebook on Kind with mcad=False to verify ingresses are created/deleted

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Copy link
Member

@tedhtchang tedhtchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good I think. Just small nit. See comment.

Perhaps in separate issue I think we should explore having the SDK create Route CRs directly (without using a ingresss) if is_openshift true so the OCP users don't need to provide domain_name.

Copy link
Contributor

@astefanutti astefanutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments, otherwise LGTM.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Nov 28, 2023
Copy link
Member

@tedhtchang tedhtchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my OpenShift Cluster. I run the following from notebook

# Create our cluster and submit appwrapper
namespace = "default"
cluster_name = "hfgputest-1"
local_interactive = True

cluster = Cluster(ClusterConfiguration(local_interactive=local_interactive,
                                       namespace=namespace,
                                       name=cluster_name,
                                       num_workers=1,
                                       min_cpus=1,
                                       max_cpus=1,
                                       min_memory=4,
                                       max_memory=4,
                                       num_gpus=0,
                                       image="quay.io/project-codeflare/ray:latest-py39-cu118",
                                       instascale=False,
                                       machine_types=["m5.xlarge", "p3.8xlarge"],
                                        head_cpus=1,
                                       head_memory=2,
                                      ))

and got this error:

File ~/codeflare-sdk/src/codeflare_sdk/utils/generate_yaml.py:434, in enable_local_interactive(resources, cluster_name, namespace, ingress_domain)
    431 command = command.replace("deployment-name", cluster_name)
    433 if ingress_domain is None:
--> 434     raise ValueError(
    435         "ingress_domain is invalid. For creating the client route/ingress please specify an ingress domain"
    436     )
    437 else:
    438     domain = ingress_domain

ValueError: ingress_domain is invalid. For creating the client route/ingress please specify an ingress domain

For OpenShift cluster, we shouldn't need to provide ingress_domain to update Route. Is this intended ?

@Bobbins228
Copy link
Contributor Author

@tedhtchang
Hi Ted this is expected behaviour as we no longer gather the domain from the cluster object we need to pass the domain for the init container command.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2023
Copy link
Contributor

openshift-ci bot commented Dec 5, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: astefanutti, tedhtchang
Once this PR has been reviewed and has the lgtm label, please ask for approval from bobbins228. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@astefanutti
Copy link
Contributor

astefanutti commented Dec 6, 2023

@Bobbins228, as @tedhtchang pointed out, I'd assume it's not required for users to provide the domain on OpenShift. The host/domain is automatically generated for Route if not provided, and can be retrieved from the Route resource once it got admitted.

@Bobbins228
Copy link
Contributor Author

@astefanutti Yes this is true.
We are following this approach for both the dashboard and the client routes. The reason why it is asked for the client route is for the create-cert init container which requires the domain name in place of the server-name.

Before we were gathering the domain through that Cluster instance but it has to be removed now.

Copy link
Contributor

@Fiona-Waters Fiona-Waters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run through the local_interactive notebook on a hypershift/openshift cluster. Everything runs fine. Both routes and Ingresses are created - is this expected?

@Bobbins228
Copy link
Contributor Author

@Fiona-Waters

I've run through the local_interactive notebook on a hypershift/openshift cluster. Everything runs fine. Both routes and Ingresses are created - is this expected?

Only routes should be created. Just tested it out and only routes were created on my end.
May I see a snippet of the ingress/routes in the yaml file created by the sdk?

@Fiona-Waters
Copy link
Contributor

@Fiona-Waters

I've run through the local_interactive notebook on a hypershift/openshift cluster. Everything runs fine. Both routes and Ingresses are created - is this expected?

Only routes should be created. Just tested it out and only routes were created on my end. May I see a snippet of the ingress/routes in the yaml file created by the sdk?

Sure.

- generictemplate:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          name: ray-dashboard-hfgputest-1
          namespace: default
        spec:
          rules:
          - host: ray-dashboard-hfgputest-1-default.apps.rosa.fwaters.vqoo.p3.openshiftapps.com
            http:
              paths:
              - backend:
                  service:
                    name: hfgputest-1-head-svc
                    port:
                      number: 8265
                path: /
                pathType: Prefix
      replicas: 1
    - generictemplate:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          annotations:
            nginx.ingress.kubernetes.io/rewrite-target: /
            nginx.ingress.kubernetes.io/ssl-redirect: 'true'
            route.openshift.io/termination: passthrough
          labels:
            odh-ray-cluster-service: hfgputest-1-head-svc
          name: rayclient-hfgputest-1
          namespace: default
        spec:
          ingressClassName: openshift-default
          rules:
          - host: rayclient-hfgputest-1-default.apps.rosa.fwaters.vqoo.p3.openshiftapps.com
            http:
              paths:
              - backend:
                  service:
                    name: hfgputest-1-head-svc
                    port:
                      number: 10001
                path: ''
                pathType: ImplementationSpecific

@Fiona-Waters
Copy link
Contributor

@Fiona-Waters

I've run through the local_interactive notebook on a hypershift/openshift cluster. Everything runs fine. Both routes and Ingresses are created - is this expected?

Only routes should be created. Just tested it out and only routes were created on my end. May I see a snippet of the ingress/routes in the yaml file created by the sdk?

Sure.

- generictemplate:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          name: ray-dashboard-hfgputest-1
          namespace: default
        spec:
          rules:
          - host: ray-dashboard-hfgputest-1-default.apps.rosa.fwaters.vqoo.p3.openshiftapps.com
            http:
              paths:
              - backend:
                  service:
                    name: hfgputest-1-head-svc
                    port:
                      number: 8265
                path: /
                pathType: Prefix
      replicas: 1
    - generictemplate:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          annotations:
            nginx.ingress.kubernetes.io/rewrite-target: /
            nginx.ingress.kubernetes.io/ssl-redirect: 'true'
            route.openshift.io/termination: passthrough
          labels:
            odh-ray-cluster-service: hfgputest-1-head-svc
          name: rayclient-hfgputest-1
          namespace: default
        spec:
          ingressClassName: openshift-default
          rules:
          - host: rayclient-hfgputest-1-default.apps.rosa.fwaters.vqoo.p3.openshiftapps.com
            http:
              paths:
              - backend:
                  service:
                    name: hfgputest-1-head-svc
                    port:
                      number: 10001
                path: ''
                pathType: ImplementationSpecific

The above issue was a local issue. I can confirm that I was able to successfully run through the local interactive notebook on OpenShift - with routes being successfully created (when an ingress_domain is provided) and on KinD with ingresses being successfully created.
/lgtm

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2023
@Bobbins228 Bobbins228 mentioned this pull request Dec 13, 2023
4 tasks
@Fiona-Waters
Copy link
Contributor

In trying to recreate the issue reported here I found without the changes in this PR that the ray dashboard route is not created correctly and does not allow access to the dashboard. With these changes it is created and can be accessed. I think that we should merge this PR and continue improving the overall area by working on each of the other related issues that follow on from here.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2023
@astefanutti
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2023
@astefanutti
Copy link
Contributor

Let's merge it as it improves things significantly already. We can address the remaining points separately.

@astefanutti astefanutti merged commit dce93f7 into project-codeflare:main Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster admin permissions needed for creating ingresses
4 participants